The Personal Name Problem and a Data Mining Solution

Authors

  • Clifton Phua
  • Vincent Cheng-Siong Lee
  • Kate Smith-Miles

Abstract

Almost every person has a lifelong personal name that is officially recognised and has only one correct version in their language. Each personal name typically has two components: a first name (also known as a given, fore, or Christian name) and a last name (also known as a family name or surname). Both components are strongly influenced by cultural, economic, historical, political, and social backgrounds. In most cases, either component can consist of more than one word, and the first name is usually gender-specific (see Figure 1). There are three important practical considerations for personal name analysis:

  • Balance between manual checking and analytical computing. Intuitively, only a small proportion of names should be manually reviewed, the results have to be reasonably accurate, and no single personal name should take too long to process.
  • Reliability of the verification data has to be examined. Keeping the name verification database's updating process separate from incoming names can prevent possible data manipulation or corruption over time. However, names in different databases can also be incompatible for genuine reasons, such as cultural and historical traditions, translation and transliteration, reporting and recording variations, and typographical and phonetic errors (Borgman and Siegfried, 1992).
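The abstract treats a name as a first-name and last-name pair and attributes many mismatches to typographical and phonetic errors. As an illustration only (this is not the authors' data-mining method), the sketch below splits a name into those two components and compares them with American Soundex, a classic phonetic code. The helper names `split_name` and `phonetic_match` are hypothetical, and the splitting rule (last whitespace-separated token is the last name) is an assumed simplification that fails for multi-word surnames and for cultures that write the family name first.

```python
# Illustrative sketch only: American Soundex plus a naive first/last split.
# Neither function is taken from the paper; both are assumptions for illustration.

# Soundex digit for each coded consonant; vowels, y, h and w are uncoded.
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of `word` ('' if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        # h and w do not separate equal codes; vowels do (they reset prev).
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


def split_name(full_name: str) -> tuple[str, str]:
    """Hypothetical rule: last token is the last name, the rest is the first name."""
    tokens = full_name.split()
    if not tokens:
        return "", ""
    if len(tokens) == 1:
        return "", tokens[0]
    return " ".join(tokens[:-1]), tokens[-1]


def phonetic_match(a: str, b: str) -> bool:
    """True if both the first-name and last-name components share a Soundex code."""
    first_a, last_a = split_name(a)
    first_b, last_b = split_name(b)
    return soundex(first_a) == soundex(first_b) and soundex(last_a) == soundex(last_b)


if __name__ == "__main__":
    print(soundex("Robert"), soundex("Rupert"))       # R163 R163
    print(split_name("Kate Smith-Miles"))             # ('Kate', 'Smith-Miles')
    print(phonetic_match("Jon Smith", "John Smyth"))  # True
    print(phonetic_match("Jon Smith", "John Jones"))  # False
```

Soundex always keeps the first letter, so it cannot match variants such as "Catherine" and "Katherine", and it says nothing about transliteration or cultural naming order; in practice it would be only one signal among several.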

Similar resources

Personal Credit Score Prediction using Data Mining Algorithms (Case Study: Bank Customers)

Knowledge and information extraction from data is an age-old concept in scientific studies. In industrial decision-making processes, the application of this concept gives rise to data-mining opportunities. Personal credit scoring is an ever-vital tool for banking systems in order to manage and minimize the inherent risks of the financial sector, thus, the design and improvement of credit scorin...

A Improved Privacy Preserving Algorithm Using Association Rule Mining in Centralized Database

The recent advancement in data mining technology to analyze vast amount of data has played an important role in several areas of Business processing. Data mining also opens new threats to privacy and information security if not done or used properly. The main problem is that to hide sensitive information, including personal information, fact or even patterns which are generated by any algorithm...

A Comprehensive Study of Several Meta-Heuristic Algorithms for Open-Pit Mine Production Scheduling Problem Considering Grade Uncertainty

It is significant to discover a global optimization in the problems dealing with large dimensional scales to increase the quality of decision-making in the mining operation. It has been broadly confirmed that the long-term production scheduling (LTPS) problem performs a main role in mining projects to develop the performance regarding the obtainability of constraints, while maximizing the whole...

a swift heuristic algorithm base on data mining approach for the Periodic Vehicle Routing Problem: data mining approach

periodic vehicle routing problem focuses on establishing a plan of visits to clients over a given time horizon so as to satisfy some service level while optimizing the routes used in each time period. This paper presents a new effective heuristic algorithm based on data mining tools for periodic vehicle routing problem (PVRP). The related results of proposed algorithm are compared with the resu...

A genetic algorithm approach for open-pit mine production scheduling

In an Open-Pit Production Scheduling (OPPS) problem, the goal is to determine the mining sequence of an orebody as a block model. In this article, linear programing formulation is used to aim this goal. OPPS problem is known as an NP-hard problem, so an exact mathematical model cannot be applied to solve in the real state. Genetic Algorithm (GA) is a well-known member of evolutionary algorithms...

Publication date: 2009